Today we will…
Your grade reflects the completeness of your submission, not the correctness!
Code:
echo: false.
R functions you have used (“We used str_detect to…”).
NA from per_cap_gdp.”
Citations:
Style + Organization:
Any good analysis should include a check of the “adequacy of the fit of the model to the data and the plausibility of the model…” – Andrew Gelman
Predictive checks allow us to assess if our fitted model would produce data similar to the data that we observed.
This is an assessment of model fit.
Caution
Predictive checks are not intended to predict the response variable for new values of the explanatory variable.
For simple linear regression, we assume the responses can be modeled as a linear function of the explanatory variable and some error.
\[y = \beta_0 + \beta_1 x_1 + \varepsilon\]
We also assume that those errors \((\varepsilon)\) follow a normal distribution with mean 0 and standard deviation \(\sigma\).
\[\varepsilon \sim N(0, \sigma)\]
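Therefore, the data we would expect to come from this model can be generated by:
\[\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1\]
and
\[y_{sim} = \hat{y} + e, \quad e \sim N(0, \hat{\sigma})\]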
This method produces data that perfectly agree with the linear model conditions:
Linear relationship between \(x\) and \(y\).
Independence of observations.
Normality of residuals.
Equal variance of residuals.
If we compare data generated from the linear model to the observed data, we can determine how well the linear model fits the observed data.
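To make the data-generating idea concrete, here is a minimal sketch that draws data from a simple linear regression model. The parameter values (`beta_0`, `beta_1`, `sigma_true`), the sample size, and the range of `x` are all hypothetical choices for illustration; they do not come from any dataset in these slides.

```r
set.seed(331)  # arbitrary seed so the sketch is reproducible

# Hypothetical "true" parameters (assumptions for illustration only)
beta_0     <- 2
beta_1     <- 0.5
sigma_true <- 1

# Explanatory values and normally distributed errors with mean 0
x       <- runif(100, min = 0, max = 10)
epsilon <- rnorm(100, mean = 0, sd = sigma_true)

# Responses are a linear function of x plus error
y <- beta_0 + beta_1 * x + epsilon

head(data.frame(x, y))
```

Data built this way satisfy the linear model conditions by construction, which is what makes them a useful reference point for the observed data.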
To perform a predictive check…
Fit a regression model to the observed data.
For a set of explanatory values, obtain predicted response values from the model.
Add random errors to the predictions.
Compare the simulated data to the observed data.
Iterate!
Fit a regression model to the observed data.
Use the lm() function…
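As a rough illustration of this step, the sketch below fits a simple linear regression with `lm()`. The built-in `cars` dataset (stopping distance `dist` versus `speed`) is used purely as a stand-in; it is an assumption and not the data from the slides or your project.

```r
# Stand-in data: the built-in `cars` dataset (an assumption, not from the slides)
obs_lm <- lm(dist ~ speed, data = cars)

# Estimated intercept and slope
coef(obs_lm)
```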
For a set of explanatory values, obtain predicted response values from the model.
Use the predict() function…
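One way this step might look, again using the built-in `cars` dataset as a stand-in (an assumption for illustration). When `predict()` is called without `newdata`, it returns the fitted values at the observed explanatory values.

```r
# Stand-in model on the built-in `cars` dataset (an assumption)
obs_lm <- lm(dist ~ speed, data = cars)

# With no `newdata`, predict() returns fitted values for the observed x values
preds <- predict(obs_lm)

head(preds)
```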
Add random errors to the predictions.
Use the rnorm() function…
The random errors have mean 0 and standard deviation estimated by the residual standard error (use sigma()).
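A minimal sketch of adding random errors to the predictions, assuming the same stand-in `cars` model as an illustration. The variable names (`preds`, `errors`, `sim_dist`) are hypothetical.

```r
set.seed(331)  # arbitrary seed so the sketch is reproducible

# Stand-in model on the built-in `cars` dataset (an assumption)
obs_lm <- lm(dist ~ speed, data = cars)
preds  <- predict(obs_lm)

# One random error per prediction: mean 0, sd = residual standard error
errors <- rnorm(n = length(preds), mean = 0, sd = sigma(obs_lm))

# Simulated responses = predictions + random errors
sim_dist <- preds + errors

head(sim_dist)
```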
Compare the simulated data to the observed data.
Use the lm() function to regress observed on simulated…
To measure similarity, record \(R^2\) (proportion of variability in \(y\) explained by a linear relationship with \(x\)).
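The comparison step could be sketched as follows, assuming the stand-in `cars` data and hypothetical column names `obs_dist` and `sim_dist`. The observed responses are regressed on one simulated set of responses, and \(R^2\) is pulled from the model summary.

```r
set.seed(331)  # arbitrary seed so the sketch is reproducible

# Stand-in model on the built-in `cars` dataset (an assumption)
obs_lm <- lm(dist ~ speed, data = cars)

# Pair each observed response with one simulated response
sim_data <- data.frame(
  obs_dist = cars$dist,
  sim_dist = predict(obs_lm) +
    rnorm(nrow(cars), mean = 0, sd = sigma(obs_lm))
)

# Regress observed on simulated and record R-squared
check_lm <- lm(obs_dist ~ sim_dist, data = sim_data)
r_sq     <- summary(check_lm)$r.squared

r_sq
```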
Iterate!
Use the map() function to repeat the process over and over…
We want to see how the model performs across many simulated datasets.
Instead of \(R^2\), we could use the correlation \((r)\), the sum of squared errors \((SSE)\), or the estimate of \(\sigma\) \((RMSE)\) to measure similarity.
Plot the distribution of simulated \(R^2\) values to see how well the model performs.
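Putting the steps together, here is one possible sketch of the iteration. It assumes the stand-in `cars` data, a hypothetical helper `simulate_r_sq()`, and an arbitrary choice of 1000 simulations. It uses `map_dbl()`, the purrr variant of `map()` that returns a numeric vector instead of a list.

```r
library(purrr)
library(ggplot2)

set.seed(331)  # arbitrary seed so the sketch is reproducible

# Stand-in model on the built-in `cars` dataset (an assumption)
obs_lm <- lm(dist ~ speed, data = cars)
n_sims <- 1000  # arbitrary number of simulated datasets

# Hypothetical helper: simulate one dataset from the fitted model,
# regress observed on simulated, and return R-squared
simulate_r_sq <- function(model, observed) {
  simulated <- predict(model) +
    rnorm(length(observed), mean = 0, sd = sigma(model))
  summary(lm(observed ~ simulated))$r.squared
}

# Repeat the simulation n_sims times, collecting one R-squared per run
sim_r_sq <- map_dbl(1:n_sims, ~ simulate_r_sq(obs_lm, cars$dist))

# Distribution of simulated R-squared values
ggplot(data.frame(r_sq = sim_r_sq), aes(x = r_sq)) +
  geom_histogram(bins = 30) +
  labs(x = "Simulated R-squared", y = "Number of simulations")
```

Simulated \(R^2\) values that cluster near 1 suggest the data generated by the model look a lot like the observed data; values near 0 suggest they do not.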
For your group project, you will run predictive checks to assess how well your model performs.
Game Plan Survey
Course Evaluation
Final Project Report
Final Exam – Thursday 3/21
Think about the readability of the numbers you are presenting.
Include units on your plots!
If you do any transformations, make sure you mention them.
The exam is cumulative and will definitely contain questions on:
dplyr and tidyr.
ggplot.
map, lm.